feat: Add Nordic AXON NPU backend for nRF54LM20B #18863
petriok wants to merge 1 commit into pytorch:main from
Conversation
Add ExecuTorch backend for Nordic Semiconductor's AXON NPU, targeting the nRF54LM20B (ARM Cortex-M33 + hardware neural network accelerator). Follows the same composition pattern as the Ethos-U backend: reuses `TOSABackend` for TOSA lowering, then compiles to AXON command buffers via Nordic's compiler library.

Python backend (`backends/nordic/`):
- `AxonBackend`: `@final` BackendDetails with TOSA composition
- `AxonPartitioner`: extends `TOSAPartitioner` with AXON constraint checks
- `AxonQuantizer`: wraps `TOSAQuantizer` with AXON INT8 defaults
- `AxonCompileSpec`: hardware constraints and SDK path configuration
- TOSA-to-AXON compiler bridge with per-op converters
- Subgraph naming (content-hash), marker format, header code generation
- Operator support checks (FC 2048, Conv 16x16, Pool 32x32, max 2 inputs)

C++ runtime (`backends/nordic/runtime/`):
- `AxonBackend` delegate: marker-based multi-subgraph lookup, profiling API
- Op extensions: single-precision sigmoid/tanh CPU callbacks

Zephyr integration:
- Kconfig: `EXECUTORCH_BUILD_NORDIC_AXON` (depends on `NRF_AXON`)
- CMakeLists: auto-link `executorch_delegate_axon`

Examples (`examples/nordic/`):
- `hello_axon`: minimal sin(x) regression (1 AXON subgraph)
- `multi_layer`: chained FC classifier with profiling (1 subgraph)
- `simple_rnn`: RNN with recurrent state — multi-subgraph delegation (2 subgraphs)
- Each includes an export script, Zephyr firmware, and setup instructions

Tests: 48 passed, 6 skipped (require Nordic SDK)

Hardware verified on the nRF54LM20DK: all three examples produce correct inference output with AXON NPU acceleration.

Nordic's `sdk-edge-ai` (containing the AXON compiler library) is an external dependency, not redistributed. It is discovered via the `SDK_EDGE_AI_PATH` environment variable — the same pattern as Ethos-U's dependency on Vela.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
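The per-op constraint checks mentioned above ("FC 2048, Conv 16x16, Pool 32x32, max 2 inputs") can be sketched as a small support predicate. This is a hedged illustration under one plausible reading of those limits — the names, function signature, and exact semantics here are assumptions, not the backend's real API:

```python
# Illustrative sketch of AXON operator-support checks; limits follow one
# plausible reading of the PR description (FC width <= 2048, Conv kernels
# <= 16x16, Pool windows <= 32x32, at most 2 inputs). Not the real API.

AXON_LIMITS = {
    "fc_max_features": 2048,   # fully-connected layer width cap
    "conv_max_kernel": 16,     # per-dimension Conv kernel cap
    "pool_max_window": 32,     # per-dimension Pool window cap
    "max_inputs": 2,           # AXON ops take at most two inputs
}

def axon_supports(op: str, *, inputs: int = 1, features: int = 0,
                  kernel: tuple = (1, 1), window: tuple = (1, 1)) -> bool:
    """Return True if a single op fits the (assumed) AXON hardware limits."""
    if inputs > AXON_LIMITS["max_inputs"]:
        return False
    if op == "fc":
        return features <= AXON_LIMITS["fc_max_features"]
    if op == "conv2d":
        return all(k <= AXON_LIMITS["conv_max_kernel"] for k in kernel)
    if op == "pool2d":
        return all(w <= AXON_LIMITS["pool_max_window"] for w in window)
    return False  # unknown ops fall back to the CPU

print(axon_supports("fc", features=1024))       # fits the FC cap -> True
print(axon_supports("conv2d", kernel=(17, 3)))  # kernel too large -> False
```

In the real backend these checks live in the partitioner, which leaves unsupported ops to run on the Cortex-M33 instead of delegating them.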
Lists NCS v3.3.0-preview3 and links to pytorch/executorch#18863 as explicit prerequisites for the deploy notebook. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run PyTorch models on Nordic's AXON NPU. This repository packages the showcase side of the AXON backend work: models, notebooks, and the deployment environment. The backend itself is proposed upstream as pytorch/executorch#18863.

Included:
- Three hardware-verified showcase models (nRF54LM20DK), each with a PyTorch training path, INT8 PT2E quantization with `AxonQuantizer`, `AxonPartitioner` delegation, and Zephyr firmware:
  * Anomaly detection — autoencoder, 428 params, ~230 µs/inference
  * Image classifier — 8×8 CNN, 1,508 params, ~680 µs/inference
  * Keyword spotting — MFCC CNN, 16,332 params, ~1,600 µs/inference
- Four progressive Jupyter notebooks covering first principles, each of the three showcase tasks, and end-to-end flash + serial verification on the DK.
- Docker environment with everything pre-installed: NCS toolchain v3.3.0-preview3, ExecuTorch with the AXON backend, PyTorch (CPU), SEGGER J-Link, Jupyter Lab. One build + one run to reach a working environment.
- Architecture and supported-ops guides covering TOSA composition with the ARM Ethos-U backend, the `AxonPartitioner`, and the current NPU op set (FC, Conv1D/2D, depthwise Conv, pool, element-wise, ReLU family) plus op extensions for sigmoid, tanh, and softmax.

Apache 2.0 licensed. Nordic's sdk-edge-ai is proprietary and mounted by the user at runtime; SEGGER J-Link is downloaded at Docker build time under SEGGER's terms. See the README for the full third-party table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Created a showcase repository: https://github.com/ioteai/axon-ai
Thank you for the PR @petriok, this is super cool! I've been wanting to check out the Axon NPU. Now that ExecuTorch has reached GA, we're working on defining criteria for backends to be hosted in the executorch tree. We'll need to consider functionality, performance, CI infrastructure, and maintainership. For now, the Axon NPU backend might be better off out of tree. Here are a couple of examples of out-of-tree backends: If you're interested in going that route, we can also add links and documentation for your backend repository in the executorch repository and documentation.
Summary
ExecuTorch backend for Nordic Semiconductor's AXON NPU on the nRF54LM20B (ARM Cortex-M33 + hardware neural network accelerator).
Follows the same composition pattern as the Ethos-U backend: reuses `TOSABackend` for TOSA lowering, then compiles to AXON command buffers via Nordic's compiler library.
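The two-stage composition described above can be sketched with stand-in types. This is illustrative only — the classes and functions below are toy stand-ins, not the real ExecuTorch, TOSA, or Nordic compiler APIs:

```python
# Toy sketch of the composition pattern: a generic TOSA lowering stage whose
# output is handed to an AXON-specific compile stage, mirroring how the
# Ethos-U backend composes with TOSABackend. Not the real ExecuTorch API.
from dataclasses import dataclass

@dataclass
class TosaGraph:
    """Stand-in for the serialized TOSA representation."""
    ops: list

def tosa_lower(module_ops: list) -> TosaGraph:
    # Stage 1: target-neutral TOSA lowering (TOSABackend's job in the real flow).
    return TosaGraph(ops=[f"tosa.{op}" for op in module_ops])

def axon_compile(graph: TosaGraph) -> bytes:
    # Stage 2: TOSA -> AXON command buffer (Nordic's compiler in the real flow).
    return "\n".join(graph.ops).encode()

def axon_preprocess(module_ops: list) -> bytes:
    # The AXON backend reuses the TOSA stage unchanged, then compiles its output.
    return axon_compile(tosa_lower(module_ops))

payload = axon_preprocess(["fully_connected", "relu"])
print(payload.decode())  # prints "tosa.fully_connected" then "tosa.relu"
```

The design point is that AXON-specific code only begins after TOSA lowering, so fixes to the shared TOSA stage benefit both the Ethos-U and AXON backends.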
Python backend (`backends/nordic/`): `AxonBackend`, `AxonPartitioner`, `AxonQuantizer`, `AxonCompileSpec`

C++ runtime (`backends/nordic/runtime/`)

Zephyr integration:
- `EXECUTORCH_BUILD_NORDIC_AXON` Kconfig option
- `executorch_delegate_axon` in CMake

Examples (`examples/nordic/`): `hello_axon`, `multi_layer`, `simple_rnn`. Each example includes an export script, Zephyr firmware, and step-by-step README.
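The commit message mentions content-hash subgraph naming and a marker-based multi-subgraph lookup in the runtime (needed when, as in `simple_rnn`, one model produces two AXON subgraphs). A hedged sketch of that idea — the hash choice, marker layout, and name lengths here are assumptions:

```python
# Illustrative sketch of content-hash subgraph naming and marker-based
# lookup; the real backend's hash, marker format, and lengths may differ.
import hashlib

def subgraph_name(command_buffer: bytes, prefix: str = "axon") -> str:
    """Derive a stable name from the compiled payload so identical subgraphs
    get identical names across re-exports (content addressing)."""
    digest = hashlib.sha256(command_buffer).hexdigest()[:8]
    return f"{prefix}_{digest}"

def find_subgraph(payload: bytes, table: dict) -> bytes:
    """Runtime side: recompute the marker and look up the matching subgraph,
    so a delegate with several subgraphs dispatches to the right one."""
    return table[subgraph_name(payload)]

buf = b"\x01\x02AXON-cmds"
table = {subgraph_name(buf): buf}   # registry built at export time
assert find_subgraph(buf, table) is buf
print(subgraph_name(buf))           # stable "axon_<hash>" identifier
```

Content addressing also makes the generated headers deterministic: re-exporting an unchanged model yields byte-identical names.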
Nordic's `sdk-edge-ai` (containing the AXON compiler library) is an external dependency, not redistributed. It is discovered via the `SDK_EDGE_AI_PATH` environment variable — the same pattern as Ethos-U's dependency on Vela.

Test plan
- `hello_axon`: sin(1.57) = 0.967, 163 µs @ 128 MHz, 1 AXON delegate
- `multi_layer`: 4/4 classes correct, ~210 µs, 1 delegate (chained layers)
- `simple_rnn`: 4 RNN steps, ~690 µs/step, 2 delegates (tanh breaks chain)